

Search for: All records

Creators/Authors contains: "Song, Y"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo period.


  1. Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design, and training recipes. The research community has responded by using distillation from black-box models to label training data, achieving strong benchmark results at the cost of measurable scientific progress: without knowing the details of the teacher model and its data sources, progress remains difficult to measure. In this paper, we study building a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding. We analyze standard training pipelines without distillation from proprietary models and explore large-scale synthetic data to identify critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded video captions. Additionally, we introduce PLM-VideoBench, a suite for evaluating challenging video understanding tasks, focusing on the ability to reason about the "what", "where", "when", and "how" of a video. We make our work fully reproducible by providing data, training recipes, code, and models.
    Free, publicly-accessible full text available July 23, 2026
  2. Self-supervised learning (SSL) is essential for obtaining foundation models in the NLP and CV domains, as it effectively leverages knowledge in large-scale unlabeled data. A key reason for its success is that a suitable SSL design can help the model follow the neural scaling law, i.e., performance consistently improves with increasing model and dataset sizes. However, it remains a mystery whether existing SSL techniques in the graph domain can follow this scaling behavior toward building Graph Foundation Models (GFMs) with large-scale pre-training. In this study, we examine whether existing graph SSL techniques can follow the neural scaling behavior and thus potentially serve as an essential component of GFMs. Our benchmark includes comprehensive SSL technique implementations, with analysis conducted on both the conventional SSL setting and many new settings adopted from other domains. Surprisingly, although the SSL loss continuously decreases, no existing graph SSL technique follows the neural scaling behavior in downstream performance: model performance merely fluctuates across different data and model scales. Rather than scale, the key factors influencing performance are the choice of model architecture and pretext task design. This paper examines the feasibility of existing graph SSL techniques for developing GFMs and opens a new direction for graph SSL design with the proposed evaluation prototype (a brief sketch of the scaling-law form appears after this list). Our code implementation is available online to ease reproducibility: https://github.com/HaitaoMao/GraphSSLScaling.
    Free, publicly-accessible full text available November 25, 2025
  3. We study the formation of oil droplets from an initially trapped large oil ganglion under surfactant flooding, using a microfluidic device consisting of a two-dimensional array of regularly spaced square posts. We observe that above a critical capillary number for oil mobilization, breakage of the ganglion results in the formation of either trapped patches spanning multiple pores or numerous mobile droplets that exit the device at a velocity comparable to the average flooding-fluid velocity. These mobile droplets, however, are observed only above a secondary capillary-number threshold. Their formation involves the simultaneous occurrence of three passive droplet-generation mechanisms, in which a droplet forms as it is pulled by perpendicular fluid flow, as it is pulled by coaxial fluid flow, or as it splits upon collision with a post. Our results show that oil breakthrough occurs only when the oil is in the form of mobile droplets, suggesting that droplet formation can be an important condition for the mobility of residual oil in porous media. Additionally, this post-array microfluidic device can be used to produce monodisperse droplets whose size can be controlled by the spacing of the posts (the capillary number is defined in the sketch after this list).
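For context on record 2, here is a minimal sketch of the scaling-law form that study tests against. The power-law parameterization below, with loss L, scale N (model or dataset size), and fitted constants N_c and alpha_N, is the common formulation from the language-model scaling literature and is an assumption here, not notation taken from the paper:

    % Assumed power-law form of the neural scaling law; N_c and \alpha_N
    % are hypothetical fitted constants, not values reported in the paper.
    \begin{equation}
      L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}
    \end{equation}

Under this form, halving the loss requires multiplying the scale N by a factor of 2^{1/\alpha_N}; the study's central finding is that graph SSL downstream performance shows no such systematic dependence on N.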
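For record 3, the capillary number is the standard dimensionless ratio of viscous to capillary (interfacial-tension) forces. The sketch below uses the textbook definition; the numerical values are illustrative assumptions, not measurements from the paper:

    % Capillary number: viscous forces relative to interfacial tension,
    % with flooding-fluid viscosity \mu, velocity v, and interfacial
    % tension \sigma.
    \begin{equation}
      \mathrm{Ca} = \frac{\mu v}{\sigma}
    \end{equation}
    % Illustrative example (assumed values, not from the paper):
    % \mu = 10^{-3}\,\mathrm{Pa\,s},\; v = 10^{-4}\,\mathrm{m/s},\;
    % \sigma = 10^{-2}\,\mathrm{N/m} \;\Rightarrow\; \mathrm{Ca} = 10^{-5}.

In the study's terms, oil mobilization and mobile-droplet formation correspond to exceeding a critical and a secondary Ca threshold, respectively.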